Make Diagnostic Plots for NA-CORDEX Zarr Stores¶

In [1]:
import xarray as xr
import numpy as np
import intake
import ast
import dask_jobqueue
import matplotlib.pyplot as plt

from dask_jobqueue import PBSCluster

Create and Connect to a Dask Distributed Cluster¶

Run the cell below if the notebook is running on a NCAR supercomputer. If the notebook is running on a different parallel computing environment, you will need to replace the usage of PBSCluster with a similar object from dask_jobqueue or dask_gateway.

In [2]:
num_jobs = 20
walltime = '0:20:00'
memory='10GB' 

cluster = PBSCluster(cores=1, processes=1, walltime=walltime, memory=memory, queue='casper', 
                     resource_spec='select=1:ncpus=1:mem=10GB',)
cluster.scale(jobs=num_jobs)


from distributed import Client
client = Client(cluster)
cluster

PBSCluster

3c98ff67

Dashboard: https://jupyterhub.hpc.ucar.edu/stable/user/bonnland/proxy/8787/status Workers: 0
Total threads: 0 Total memory: 0 B

Scheduler Info

Scheduler

Scheduler-65da783d-70d7-4a4a-8734-6cec5f914a03

Comm: tcp://10.12.206.46:36255 Workers: 0
Dashboard: https://jupyterhub.hpc.ucar.edu/stable/user/bonnland/proxy/8787/status Total threads: 0
Started: 3 minutes ago Total memory: 0 B

Workers

☝️ Link to Dask dashboard will appear above.

Find and Obtain Data Using an Intake Catalog¶

Choose Cloud Storage (AWS or NCAR Cloud)¶

In [3]:
# If True,  use NCAR Cloud Storage.  Requires an NCAR user account. 
# If False, use AWS  Cloud Storage.

USE_NCAR_CLOUD_STORAGE = False

Define the Intake Catalog URL and Storage Access Options¶

In [4]:
if USE_NCAR_CLOUD_STORAGE:
    catalog_url = "https://stratus.ucar.edu/ncar-na-cordex/catalogs/aws-na-cordex.json"
    storage_options={"anon": True, 'client_kwargs':{"endpoint_url":"https://stratus.ucar.edu/"}}
                     
else:
    catalog_url = "https://ncar-na-cordex.s3-us-west-2.amazonaws.com/catalogs/aws-na-cordex.json"
    storage_options={"anon": True}

Open catalog and produce a content summary¶

In [5]:
# Have the catalog interpret the "na-cordex-models" column as a list of values, as opposed to single values.
col = intake.open_esm_datastore(catalog_url, read_csv_kwargs={"converters": {"na-cordex-models": ast.literal_eval}},)
col

aws-na-cordex catalog with 330 dataset(s) from 330 asset(s):

unique
variable 15
standard_name 10
long_name 18
units 10
spatial_domain 1
grid 2
spatial_resolution 2
scenario 6
start_time 3
end_time 4
frequency 1
vertical_levels 1
bias_correction 3
na-cordex-models 26
path 330
derived_variable 0
In [6]:
# Show the first few lines of the catalog
col.df.head(10)
Out[6]:
variable standard_name long_name units spatial_domain grid spatial_resolution scenario start_time end_time frequency vertical_levels bias_correction na-cordex-models path
0 hurs relative_humidity Near-Surface Relative Humidity % north_america NAM-22i 0.25 deg eval 1979-01-01T12:00:00 2014-12-31T12:00:00 day 1 raw [ERA-Int.CRCM5-UQAM, ERA-Int.CRCM5-OUR, ERA-In... s3://ncar-na-cordex/day/hurs.eval.day.NAM-22i....
1 hurs relative_humidity Near-Surface Relative Humidity % north_america NAM-44i 0.50 deg eval 1979-01-01T12:00:00 2015-12-31T12:00:00 day 1 raw [ERA-Int.CRCM5-UQAM, ERA-Int.RegCM4, ERA-Int.H... s3://ncar-na-cordex/day/hurs.eval.day.NAM-44i....
2 hurs relative_humidity Near-Surface Relative Humidity % north_america NAM-22i 0.25 deg hist-rcp45 1949-01-01T12:00:00 2100-12-31T12:00:00 day 1 mbcn-Daymet [CanESM2.CanRCM4] s3://ncar-na-cordex/day/hurs.hist-rcp45.day.NA...
3 hurs relative_humidity Near-Surface Relative Humidity % north_america NAM-22i 0.25 deg hist-rcp45 1949-01-01T12:00:00 2100-12-31T12:00:00 day 1 mbcn-gridMET [CanESM2.CanRCM4] s3://ncar-na-cordex/day/hurs.hist-rcp45.day.NA...
4 hurs relative_humidity Near-Surface Relative Humidity % north_america NAM-22i 0.25 deg hist-rcp45 1949-01-01T12:00:00 2100-12-31T12:00:00 day 1 raw [GFDL-ESM2M.CRCM5-OUR, CanESM2.CRCM5-OUR, CanE... s3://ncar-na-cordex/day/hurs.hist-rcp45.day.NA...
5 hurs relative_humidity Near-Surface Relative Humidity % north_america NAM-44i 0.50 deg hist-rcp45 1949-01-01T12:00:00 2100-12-31T12:00:00 day 1 mbcn-Daymet [MPI-ESM-LR.CRCM5-UQAM, CanESM2.CRCM5-UQAM, EC... s3://ncar-na-cordex/day/hurs.hist-rcp45.day.NA...
6 hurs relative_humidity Near-Surface Relative Humidity % north_america NAM-44i 0.50 deg hist-rcp45 1949-01-01T12:00:00 2100-12-31T12:00:00 day 1 mbcn-gridMET [MPI-ESM-LR.CRCM5-UQAM, CanESM2.CRCM5-UQAM, EC... s3://ncar-na-cordex/day/hurs.hist-rcp45.day.NA...
7 hurs relative_humidity Near-Surface Relative Humidity % north_america NAM-44i 0.50 deg hist-rcp45 1949-01-01T12:00:00 2100-12-31T12:00:00 day 1 raw [MPI-ESM-LR.CRCM5-UQAM, CanESM2.CRCM5-UQAM, EC... s3://ncar-na-cordex/day/hurs.hist-rcp45.day.NA...
8 hurs relative_humidity Near-Surface Relative Humidity % north_america NAM-22i 0.25 deg hist-rcp85 1949-01-01T12:00:00 2100-12-31T12:00:00 day 1 mbcn-Daymet [MPI-ESM-MR.CRCM5-UQAM, GEMatm-Can.CRCM5-UQAM,... s3://ncar-na-cordex/day/hurs.hist-rcp85.day.NA...
9 hurs relative_humidity Near-Surface Relative Humidity % north_america NAM-22i 0.25 deg hist-rcp85 1949-01-01T12:00:00 2100-12-31T12:00:00 day 1 mbcn-gridMET [MPI-ESM-MR.CRCM5-UQAM, GEMatm-Can.CRCM5-UQAM,... s3://ncar-na-cordex/day/hurs.hist-rcp85.day.NA...
In [7]:
# Produce a catalog content summary.

uniques = col.unique()

columns = ["variable", "scenario", "grid", "na-cordex-models", "bias_correction"]
for column in columns:
    print(f'{column}: {uniques[column]}\n')
variable: ['hurs', 'huss', 'pr', 'prec', 'ps', 'rsds', 'sfcWind', 'tas', 'tasmax', 'tasmin', 'temp', 'tmax', 'tmin', 'uas', 'vas']

scenario: ['eval', 'hist-rcp45', 'hist-rcp85', 'hist', 'rcp45', 'rcp85']

grid: ['NAM-22i', 'NAM-44i']

na-cordex-models: ['ERA-Int.CRCM5-UQAM', 'ERA-Int.CRCM5-OUR', 'ERA-Int.RegCM4', 'ERA-Int.CanRCM4', 'ERA-Int.WRF', 'ERA-Int.HIRHAM5', 'ERA-Int.RCA4', 'CanESM2.CanRCM4', 'GFDL-ESM2M.CRCM5-OUR', 'CanESM2.CRCM5-OUR', 'MPI-ESM-LR.CRCM5-UQAM', 'CanESM2.CRCM5-UQAM', 'EC-EARTH.HIRHAM5', 'EC-EARTH.RCA4', 'CanESM2.RCA4', 'MPI-ESM-MR.CRCM5-UQAM', 'GEMatm-Can.CRCM5-UQAM', 'GEMatm-MPI.CRCM5-UQAM', 'HadGEM2-ES.RegCM4', 'GFDL-ESM2M.RegCM4', 'MPI-ESM-LR.RegCM4', 'HadGEM2-ES.WRF', 'GFDL-ESM2M.WRF', 'MPI-ESM-LR.WRF', 'CNRM-CM5.CRCM5-OUR', 'MPI-ESM-LR.CRCM5-OUR']

bias_correction: ['raw', 'mbcn-Daymet', 'mbcn-gridMET']

Load data into xarray using the catalog¶

In [8]:
data_var = 'tmax'

col_subset = col.search(
    variable=data_var,
    grid="NAM-44i",
    scenario="eval",
    bias_correction="raw",
)

col_subset

aws-na-cordex catalog with 1 dataset(s) from 1 asset(s):

unique
variable 1
standard_name 1
long_name 1
units 1
spatial_domain 1
grid 1
spatial_resolution 1
scenario 1
start_time 1
end_time 1
frequency 1
vertical_levels 1
bias_correction 1
na-cordex-models 6
path 1
derived_variable 0
In [9]:
col_subset.df
Out[9]:
variable standard_name long_name units spatial_domain grid spatial_resolution scenario start_time end_time frequency vertical_levels bias_correction na-cordex-models path
0 tmax air_temperature Daily Maximum Near-Surface Air Temperature degC north_america NAM-44i 0.50 deg eval 1979-01-01T12:00:00 2015-12-31T12:00:00 day 1 raw [ERA-Int.CRCM5-UQAM, ERA-Int.RegCM4, ERA-Int.H... s3://ncar-na-cordex/day/tmax.eval.day.NAM-44i....
In [10]:
# Load catalog entries for subset into a dictionary of xarray datasets, and open the first one.
dsets = col_subset.to_dataset_dict(
    xarray_open_kwargs={"consolidated": True}, storage_options=storage_options
)
print(f"\nDataset dictionary keys:\n {dsets.keys()}")

# Load the first dataset and display a summary.
dataset_key = list(dsets.keys())[0]
store_name = dataset_key + ".zarr"

ds = dsets[dataset_key]
ds

# Note that the summary includes a 'member_id' coordinate, which is a renaming of the 
# 'na-cordex-models' column in the catalog.
--> The keys in the returned dictionary of datasets are constructed as follows:
	'variable.frequency.scenario.grid.bias_correction'
100.00% [1/1 00:17<00:00]
Dataset dictionary keys:
 dict_keys(['tmax.day.eval.NAM-44i.raw'])
Out[10]:
<xarray.Dataset>
Dimensions:    (lat: 129, lon: 300, member_id: 6, time: 13514, bnds: 2)
Coordinates:
  * lat        (lat) float64 12.25 12.75 13.25 13.75 ... 74.75 75.25 75.75 76.25
  * lon        (lon) float64 -171.8 -171.2 -170.8 ... -23.25 -22.75 -22.25
  * member_id  (member_id) <U18 'ERA-Int.CRCM5-UQAM' ... 'ERA-Int.WRF'
  * time       (time) datetime64[ns] 1979-01-01T12:00:00 ... 2015-12-31T12:00:00
    time_bnds  (time, bnds) datetime64[ns] dask.array<chunksize=(13514, 2), meta=np.ndarray>
Dimensions without coordinates: bnds
Data variables:
    tmax       (member_id, time, lat, lon) float32 dask.array<chunksize=(4, 1000, 65, 150), meta=np.ndarray>
Attributes: (12/42)
    CORDEX_domain:                        NAM-44
    contact:                              {"ERA-Int.CRCM5-UQAM": "Winger.Katj...
    contact_note:                         {"ERA-Int.RegCM4": "Simulations by ...
    creation_date:                        {"ERA-Int.CRCM5-UQAM": "2015-06-18T...
    driving_experiment:                   {"ERA-Int.CRCM5-UQAM": "ECMWF-ERAIN...
    driving_experiment_name:              evaluation
    ...                                   ...
    intake_esm_attrs:vertical_levels:     1
    intake_esm_attrs:bias_correction:     raw
    intake_esm_attrs:na-cordex-models:    ERA-Int.CRCM5-UQAM,ERA-Int.RegCM4,E...
    intake_esm_attrs:path:                s3://ncar-na-cordex/day/tmax.eval.d...
    intake_esm_attrs:_data_format_:       zarr
    intake_esm_dataset_key:               tmax.day.eval.NAM-44i.raw
xarray.Dataset
    • lat: 129
    • lon: 300
    • member_id: 6
    • time: 13514
    • bnds: 2
    • lat
      (lat)
      float64
      12.25 12.75 13.25 ... 75.75 76.25
      long_name :
      latitude
      standard_name :
      latitude
      units :
      degrees_north
      array([12.25, 12.75, 13.25, 13.75, 14.25, 14.75, 15.25, 15.75, 16.25, 16.75,
             17.25, 17.75, 18.25, 18.75, 19.25, 19.75, 20.25, 20.75, 21.25, 21.75,
             22.25, 22.75, 23.25, 23.75, 24.25, 24.75, 25.25, 25.75, 26.25, 26.75,
             27.25, 27.75, 28.25, 28.75, 29.25, 29.75, 30.25, 30.75, 31.25, 31.75,
             32.25, 32.75, 33.25, 33.75, 34.25, 34.75, 35.25, 35.75, 36.25, 36.75,
             37.25, 37.75, 38.25, 38.75, 39.25, 39.75, 40.25, 40.75, 41.25, 41.75,
             42.25, 42.75, 43.25, 43.75, 44.25, 44.75, 45.25, 45.75, 46.25, 46.75,
             47.25, 47.75, 48.25, 48.75, 49.25, 49.75, 50.25, 50.75, 51.25, 51.75,
             52.25, 52.75, 53.25, 53.75, 54.25, 54.75, 55.25, 55.75, 56.25, 56.75,
             57.25, 57.75, 58.25, 58.75, 59.25, 59.75, 60.25, 60.75, 61.25, 61.75,
             62.25, 62.75, 63.25, 63.75, 64.25, 64.75, 65.25, 65.75, 66.25, 66.75,
             67.25, 67.75, 68.25, 68.75, 69.25, 69.75, 70.25, 70.75, 71.25, 71.75,
             72.25, 72.75, 73.25, 73.75, 74.25, 74.75, 75.25, 75.75, 76.25])
    • lon
      (lon)
      float64
      -171.8 -171.2 ... -22.75 -22.25
      long_name :
      longitude
      standard_name :
      longitude
      units :
      degrees_east
      array([-171.75, -171.25, -170.75, ...,  -23.25,  -22.75,  -22.25])
    • member_id
      (member_id)
      <U18
      'ERA-Int.CRCM5-UQAM' ... 'ERA-In...
      array(['ERA-Int.CRCM5-UQAM', 'ERA-Int.RegCM4', 'ERA-Int.HIRHAM5',
             'ERA-Int.RCA4', 'ERA-Int.CanRCM4', 'ERA-Int.WRF'], dtype='<U18')
    • time
      (time)
      datetime64[ns]
      1979-01-01T12:00:00 ... 2015-12-...
      axis :
      T
      bounds :
      time_bnds
      coordinate_defines :
      point
      delta_t :
      0000-00-01 00:00:00
      long_name :
      time
      standard_name :
      time
      array(['1979-01-01T12:00:00.000000000', '1979-01-02T12:00:00.000000000',
             '1979-01-03T12:00:00.000000000', ..., '2015-12-29T12:00:00.000000000',
             '2015-12-30T12:00:00.000000000', '2015-12-31T12:00:00.000000000'],
            dtype='datetime64[ns]')
    • time_bnds
      (time, bnds)
      datetime64[ns]
      dask.array<chunksize=(13514, 2), meta=np.ndarray>
      Array Chunk
      Bytes 211.16 kiB 211.16 kiB
      Shape (13514, 2) (13514, 2)
      Dask graph 1 chunks in 2 graph layers
      Data type datetime64[ns] numpy.ndarray
      2 13514
    • tmax
      (member_id, time, lat, lon)
      float32
      dask.array<chunksize=(4, 1000, 65, 150), meta=np.ndarray>
      cell_methods :
      time: maximum
      long_name :
      Daily Maximum Near-Surface Air Temperature
      remap :
      remapped via ESMF_regrid_with_weights: Higher-order Patch
      standard_name :
      air_temperature
      units :
      degC
      Array Chunk
      Bytes 11.69 GiB 148.77 MiB
      Shape (6, 13514, 129, 300) (4, 1000, 65, 150)
      Dask graph 112 chunks in 2 graph layers
      Data type float32 numpy.ndarray
      6 1 300 129 13514
    • lat
      PandasIndex
      PandasIndex(Index([12.25, 12.75, 13.25, 13.75, 14.25, 14.75, 15.25, 15.75, 16.25, 16.75,
             ...
             71.75, 72.25, 72.75, 73.25, 73.75, 74.25, 74.75, 75.25, 75.75, 76.25],
            dtype='float64', name='lat', length=129))
    • lon
      PandasIndex
      PandasIndex(Index([-171.75, -171.25, -170.75, -170.25, -169.75, -169.25, -168.75, -168.25,
             -167.75, -167.25,
             ...
              -26.75,  -26.25,  -25.75,  -25.25,  -24.75,  -24.25,  -23.75,  -23.25,
              -22.75,  -22.25],
            dtype='float64', name='lon', length=300))
    • member_id
      PandasIndex
      PandasIndex(Index(['ERA-Int.CRCM5-UQAM', 'ERA-Int.RegCM4', 'ERA-Int.HIRHAM5',
             'ERA-Int.RCA4', 'ERA-Int.CanRCM4', 'ERA-Int.WRF'],
            dtype='object', name='member_id'))
    • time
      PandasIndex
      PandasIndex(DatetimeIndex(['1979-01-01 12:00:00', '1979-01-02 12:00:00',
                     '1979-01-03 12:00:00', '1979-01-04 12:00:00',
                     '1979-01-05 12:00:00', '1979-01-06 12:00:00',
                     '1979-01-07 12:00:00', '1979-01-08 12:00:00',
                     '1979-01-09 12:00:00', '1979-01-10 12:00:00',
                     ...
                     '2015-12-22 12:00:00', '2015-12-23 12:00:00',
                     '2015-12-24 12:00:00', '2015-12-25 12:00:00',
                     '2015-12-26 12:00:00', '2015-12-27 12:00:00',
                     '2015-12-28 12:00:00', '2015-12-29 12:00:00',
                     '2015-12-30 12:00:00', '2015-12-31 12:00:00'],
                    dtype='datetime64[ns]', name='time', length=13514, freq=None))
  • CORDEX_domain :
    NAM-44
    contact :
    {"ERA-Int.CRCM5-UQAM": "Winger.Katja@uqam.ca", "ERA-Int.RegCM4": "Daryl Herzmann, akrherz@iastate.edu", "ERA-Int.HIRHAM5": "obc@dmi.dk", "ERA-Int.RCA4": "rossby.cordex@smhi.se", "ERA-Int.CanRCM4": "cccma_info@ec.gc.ca", "ERA-Int.WRF": "Melissa Bukovsky, bukovsky@ucar.edu"}
    contact_note :
    {"ERA-Int.RegCM4": "Simulations by Ray Arritt (deceased)", "ERA-Int.WRF": "Simulations by Hsin-I Chang, hchang@atmo.arizona.edu"}
    creation_date :
    {"ERA-Int.CRCM5-UQAM": "2015-06-18T10:56:49Z", "ERA-Int.RegCM4": "2013-11-16T21:18:29Z", "ERA-Int.HIRHAM5": "2012-09-28 12:11:10", "ERA-Int.RCA4": "2014-02-16-T00:49:00Z", "ERA-Int.CanRCM4": "2012-06-29 14:22:09", "ERA-Int.WRF": "2016-02-29T09:37:24Z"}
    driving_experiment :
    {"ERA-Int.CRCM5-UQAM": "ECMWF-ERAINT, evaluation, r1i1p1", "ERA-Int.RegCM4": "ECMWF-ERAINT, evaluation, r1i1p1", "ERA-Int.HIRHAM5": "ECMWF-ERAINT,evaluation,r1i1p1", "ERA-Int.RCA4": "ECMWF-ERAINT, evaluation, r1i1p1", "ERA-Int.CanRCM4": "ECMWF-ERAINT, evaluation, r1i1p1", "ERA-Int.WRF": "ECMWF-ERAINT,evaluation,r1i1p1"}
    driving_experiment_name :
    evaluation
    driving_model_ensemble_member :
    {"ERA-Int.CRCM5-UQAM": "r1i1p1", "ERA-Int.RegCM4": "r1i1p1", "ERA-Int.HIRHAM5": "r1i1p1", "ERA-Int.RCA4": "r1i1p1", "ERA-Int.CanRCM4": "r1i1p1", "ERA-Int.WRF": "r1i1p1"}
    driving_model_id :
    {"ERA-Int.CRCM5-UQAM": "ECMWF-ERAINT", "ERA-Int.RegCM4": "ECMWF-ERAINT", "ERA-Int.HIRHAM5": "ECMWF-ERAINT", "ERA-Int.RCA4": "ECMWF-ERAINT", "ERA-Int.CanRCM4": "ECMWF-ERAINT", "ERA-Int.WRF": "ECMWF-ERAINT"}
    experiment_id :
    evaluation
    frequency :
    day
    id :
    doi:10.5065/D6SJ1JCH
    institute_id :
    {"ERA-Int.CRCM5-UQAM": "UQAM", "ERA-Int.RegCM4": "ISU", "ERA-Int.HIRHAM5": "DMI", "ERA-Int.RCA4": "SMHI", "ERA-Int.CanRCM4": "CCCma", "ERA-Int.WRF": "UA"}
    institution :
    {"ERA-Int.CRCM5-UQAM": "Universite du Quebec a Montreal", "ERA-Int.RegCM4": "Iowa State University", "ERA-Int.HIRHAM5": "Danish Meteorological Institute", "ERA-Int.RCA4": "Swedish Meteorological and Hydrological Institute, Rossby Centre", "ERA-Int.CanRCM4": "CCCma (Canadian Centre for Climate Modelling and Analysis, Victoria, BC, Canada)", "ERA-Int.WRF": "University of Arizona"}
    model_id :
    {"ERA-Int.CRCM5-UQAM": "UQAM-CRCM5", "ERA-Int.RegCM4": "ISU-RegCM4", "ERA-Int.HIRHAM5": "DMI-HIRHAM5", "ERA-Int.RCA4": "SMHI-RCA4", "ERA-Int.CanRCM4": "CCCma-CanRCM4", "ERA-Int.WRF": "UA-WRF"}
    original_calendar :
    {"ERA-Int.CRCM5-UQAM": "gregorian", "ERA-Int.RegCM4": "standard", "ERA-Int.HIRHAM5": "gregorian", "ERA-Int.RCA4": "standard", "ERA-Int.CanRCM4": "365_day", "ERA-Int.WRF": "standard"}
    product :
    output
    project_id :
    CORDEX
    rcm_version_id :
    {"ERA-Int.CRCM5-UQAM": "v1", "ERA-Int.RegCM4": "v4.4-rc8", "ERA-Int.HIRHAM5": "v1", "ERA-Int.RCA4": "v1", "ERA-Int.CanRCM4": "r2", "ERA-Int.WRF": "v3.5.1"}
    references :
    {"ERA-Int.CRCM5-UQAM": "http://www.mrcc.uqam.ca", "ERA-Int.RegCM4": "Elguindi, et al. Regional climate model RegCM user manual version 4.4. The Abdus Salam International Centre for Theoretical Physics, Strada Costiera, Trieste, Italy. (2013).", "ERA-Int.HIRHAM5": "Christensen, et al. \"The HIRHAM regional climate model. Version 5 (beta).\" (2007).", "ERA-Int.RCA4": "http://www.smhi.se/en/Research/Research-departments/climate-research-rossby-centre", "ERA-Int.CanRCM4": "http://www.cccma.ec.gc.ca/models", "ERA-Int.WRF": "Skamarock, et al., 2005. 'A description of the Advanced Research WRF version 2.' NCAR Tech. Note NCAR/TN-468+STR"}
    title :
    {"ERA-Int.CRCM5-UQAM": "NA-CORDEX Raw NAM-44i CRCM5-UQAM ERA-Int Eval Daily Mean Daily Max 2m Temperature", "ERA-Int.RegCM4": "NA-CORDEX Raw NAM-44i RegCM4 ERA-Int Eval Daily Mean Daily Max 2m Temperature", "ERA-Int.HIRHAM5": "NA-CORDEX Raw NAM-44i HIRHAM5 ERA-Int Eval Daily Mean Daily Max 2m Temperature", "ERA-Int.RCA4": "NA-CORDEX Raw NAM-44i RCA4 ERA-Int Eval Daily Mean Daily Max 2m Temperature", "ERA-Int.CanRCM4": "NA-CORDEX Raw NAM-44i CanRCM4 ERA-Int Eval Daily Mean Daily Max 2m Temperature", "ERA-Int.WRF": "NA-CORDEX Raw NAM-44i WRF ERA-Int Eval Daily Mean Daily Max 2m Temperature"}
    tracking_id :
    {"ERA-Int.CRCM5-UQAM": "a8d17321-0e16-46eb-b8e1-a419ab582dbf", "ERA-Int.RegCM4": "52329c90-c89c-4780-b154-0244cc969507", "ERA-Int.HIRHAM5": "f02ffa94-2fe0-4d67-b98e-fa58ca57eb4d", "ERA-Int.RCA4": "4ea53413-f7e3-4135-97f3-4d4446b57431", "ERA-Int.CanRCM4": "41a6bfca-ac24-4dfd-8dce-1c5c47416ae5", "ERA-Int.WRF": "c6e0f951-7ead-4e41-8a7c-2ba34a2450d9"}
    version :
    {"ERA-Int.CRCM5-UQAM": "1.2", "ERA-Int.RegCM4": "2.1", "ERA-Int.HIRHAM5": "1.2", "ERA-Int.RCA4": "1.1", "ERA-Int.CanRCM4": "1.2", "ERA-Int.WRF": "2.3"}
    zarr-dataset-reference :
    For dataset documentation, see DOI https://doi.org/10.5065/D6SJ1JCH
    zarr-version :
    1.0
    intake_esm_vars :
    ['tmax']
    intake_esm_attrs:variable :
    tmax
    intake_esm_attrs:standard_name :
    air_temperature
    intake_esm_attrs:long_name :
    Daily Maximum Near-Surface Air Temperature
    intake_esm_attrs:units :
    degC
    intake_esm_attrs:spatial_domain :
    north_america
    intake_esm_attrs:grid :
    NAM-44i
    intake_esm_attrs:spatial_resolution :
    0.50 deg
    intake_esm_attrs:scenario :
    eval
    intake_esm_attrs:start_time :
    1979-01-01T12:00:00
    intake_esm_attrs:end_time :
    2015-12-31T12:00:00
    intake_esm_attrs:frequency :
    day
    intake_esm_attrs:vertical_levels :
    1
    intake_esm_attrs:bias_correction :
    raw
    intake_esm_attrs:na-cordex-models :
    ERA-Int.CRCM5-UQAM,ERA-Int.RegCM4,ERA-Int.HIRHAM5,ERA-Int.RCA4,ERA-Int.CanRCM4,ERA-Int.WRF
    intake_esm_attrs:path :
    s3://ncar-na-cordex/day/tmax.eval.day.NAM-44i.raw.zarr
    intake_esm_attrs:_data_format_ :
    zarr
    intake_esm_dataset_key :
    tmax.day.eval.NAM-44i.raw

Functions for Plotting¶

Helper Function to Create a Single Map Plot¶

In [11]:
def plotMap(ax, map_slice, date_object=None, member_id=None):
    """Create a map plot on the given axes, with min/max as text"""

    ax.imshow(map_slice, origin='lower')

    minval = map_slice.min(dim = ['lat', 'lon'])
    maxval = map_slice.max(dim = ['lat', 'lon'])

    # Format values to have at least 4 digits of precision.
    ax.text(0.01, 0.03, "Min: %3g" % minval, transform=ax.transAxes, fontsize=12)
    ax.text(0.99, 0.03, "Max: %3g" % maxval, transform=ax.transAxes, fontsize=12, horizontalalignment='right')
    ax.set_xticks([])
    ax.set_yticks([])
    
    if date_object:
        ax.set_title(date_object.values.astype(str)[:10], fontsize=12)
        
    if member_id:
        ax.set_ylabel(member_id, fontsize=12)
        
    return ax

Helper Function for Finding Dates with Available Data¶

In [12]:
def getValidDateIndexes(member_slice):
    """Search for the first and last dates with finite values."""
    min_values = member_slice.min(dim = ['lat', 'lon'])
    is_finite = np.isfinite(min_values)
    finite_indexes = np.where(is_finite)

    start_index = finite_indexes[0][0]
    end_index = finite_indexes[0][-1]
    return start_index, end_index

Function Producing Maps of First, Middle, and Final Timesteps¶

In [13]:
def plot_first_mid_last(ds, data_var, store_name):
    """Plot the first, middle, and final time steps for several climate runs."""
    num_members_to_plot = 4
    member_names = ds.coords['member_id'].values[0:num_members_to_plot]
    
    figWidth = 18 
    figHeight = 12 
    numPlotColumns = 3
    fig, axs = plt.subplots(num_members_to_plot, numPlotColumns, figsize=(figWidth, figHeight), constrained_layout=True)

    for index in np.arange(num_members_to_plot):
        mem_id = member_names[index]
        data_slice = ds[data_var].sel(member_id=mem_id)
           
        start_index, end_index = getValidDateIndexes(data_slice)
        midDateIndex = np.floor(len(ds.time) / 2).astype(int)

        startDate = ds.time[start_index]
        first_step = data_slice.sel(time=startDate) 
        ax = axs[index, 0]
        plotMap(ax, first_step, startDate, mem_id)

        midDate = ds.time[midDateIndex]
        mid_step = data_slice.sel(time=midDate)   
        ax = axs[index, 1]
        plotMap(ax, mid_step, midDate)

        endDate = ds.time[end_index]
        last_step = data_slice.sel(time=endDate)            
        ax = axs[index, 2]
        plotMap(ax, last_step, endDate)
        
        plt.suptitle(f'First, Middle, and Last Timesteps for Selected Runs in "{store_name}"', fontsize=20)

    return fig

Function Producing Statistical Map Plots¶

In [14]:
def plot_stat_maps(ds, data_var, store_name):
    """Plot the mean, min, max, and standard deviation values for several climate runs, aggregated over time."""
    
    num_members_to_plot = 4
    member_names = ds.coords['member_id'].values[0:num_members_to_plot]

    figWidth = 25 
    figHeight = 12 
    numPlotColumns = 4
    
    fig, axs = plt.subplots(num_members_to_plot, numPlotColumns, figsize=(figWidth, figHeight), constrained_layout=True)

    for index in np.arange(num_members_to_plot):
        mem_id = member_names[index]
        data_slice = ds[data_var].sel(member_id=mem_id)

        # Save slice in memory to prevent repeated disk loads
        data_slice = data_slice.persist()

        data_agg = data_slice.min(dim='time')
        plotMap(axs[index, 0], data_agg, member_id=mem_id)

        data_agg = data_slice.max(dim='time')
        plotMap(axs[index, 1], data_agg)

        data_agg = data_slice.mean(dim='time')
        plotMap(axs[index, 2], data_agg)

        data_agg = data_slice.std(dim='time')
        plotMap(axs[index, 3], data_agg)

    axs[0, 0].set_title(f'min({data_var})', fontsize=15)
    axs[0, 1].set_title(f'max({data_var})', fontsize=15)
    axs[0, 2].set_title(f'mean({data_var})', fontsize=15)
    axs[0, 3].set_title(f'std({data_var})', fontsize=15)

    plt.suptitle(f'Spatial Statistics for Selected Runs in "{store_name}"', fontsize=20)

    return fig

Function Producing Time Series Plots¶

Also show which dates have no available data values, as a rug plot.

In [15]:
def plot_timeseries(ds, data_var, store_name):
    """Plot the mean, min, max, and standard deviation values for several climate runs, 
       aggregated over lat/lon dimensions."""

    num_members_to_plot = 4
    member_names = ds.coords['member_id'].values[0:num_members_to_plot]

    figWidth = 25 
    figHeight = 20
    linewidth = 0.5

    numPlotColumns = 1
    fig, axs = plt.subplots(num_members_to_plot, numPlotColumns, figsize=(figWidth, figHeight))
        
    for index in np.arange(num_members_to_plot):
        mem_id = member_names[index]
        data_slice = ds[data_var].sel(member_id=mem_id)
        unit_string = ds[data_var].attrs['units']
        
        # Save slice in memory to prevent repeated disk loads
        data_slice = data_slice.persist()

        min_vals = data_slice.min(dim = ['lat', 'lon'])
        max_vals = data_slice.max(dim = ['lat', 'lon'])
        mean_vals = data_slice.mean(dim = ['lat', 'lon'])
        std_vals = data_slice.std(dim = ['lat', 'lon'])

        #missing_indexes = np.isnan(min_vals)
        missing_indexes = np.isnan(min_vals).compute()
        missing_times = ds.time[missing_indexes]

            
        axs[index].plot(ds.time, max_vals, linewidth=linewidth, label='max', color='red')
        axs[index].plot(ds.time, mean_vals, linewidth=linewidth, label='mean', color='black')
        axs[index].fill_between(ds.time, (mean_vals - std_vals), (mean_vals + std_vals), 
                                         color='grey', linewidth=0, label='std', alpha=0.5)
        axs[index].plot(ds.time, min_vals, linewidth=linewidth, label='min', color='blue')
            
        ymin, ymax = axs[index].get_ylim()
        rug_y = ymin + 0.01*(ymax-ymin)
        axs[index].plot(missing_times, [rug_y]*len(missing_times), '|', color='m', label='missing')
        axs[index].set_title(mem_id, fontsize=20)
        axs[index].legend(loc='upper right')
        axs[index].set_ylabel(unit_string)

    plt.tight_layout(pad=10.2, w_pad=3.5, h_pad=3.5)
    plt.suptitle(f'Temporal Statistics for Selected Runs in "{store_name}"', fontsize=20)

    return fig

Produce Diagnostic Plots¶

Plot First, Middle, and Final Timesteps for Several Output Runs (less compute intensive)¶

In [16]:
%%time

# Plot using the Zarr Store obtained from an earlier step in the notebook.
figure = plot_first_mid_last(ds, data_var, store_name)

plt.show()
CPU times: user 3.86 s, sys: 276 ms, total: 4.13 s
Wall time: 1min 16s

Optional: save figure to a PNG file¶

Change the value of SAVE_PLOT to True to produce a PNG file of the plot. The file will be saved in the same folder as this notebook.

Then use Jupyter's file browser to locate the file and right-click the file to download it.

In [17]:
SAVE_PLOT = False
if SAVE_PLOT:
    plotfile = f'./{dataset_key}_FML.png'
    figure.savefig(plotfile, dpi=100)

Create Statistical Map Plots for Several Output Runs (more compute intensive)¶

In [18]:
%%time

# Plot using the Zarr Store obtained from an earlier step in the notebook.
figure = plot_stat_maps(ds, data_var, store_name)

plt.show()
CPU times: user 6.11 s, sys: 371 ms, total: 6.49 s
Wall time: 1min 22s

Optional: save figure to a PNG file¶

Change the value of SAVE_PLOT to True to produce a PNG file of the plot. The file will be saved in the same folder as this notebook.

Then use Jupyter's file browser to locate the file and right-click the file to download it.

In [19]:
SAVE_PLOT = False
if SAVE_PLOT:
    plotfile = f'./{dataset_key}_MAPS.png'
    figure.savefig(plotfile, dpi=100)

Plot Time Series for Several Output Runs (more compute intensive)¶

In [20]:
%%time

# Plot using the Zarr Store obtained from an earlier step in the notebook.
figure = plot_timeseries(ds, data_var, store_name)

plt.show()
CPU times: user 8.06 s, sys: 654 ms, total: 8.72 s
Wall time: 1min 55s

Optional: save figure to a PNG file¶

Change the value of SAVE_PLOT to True to produce a PNG file of the plot. The file will be saved in the same folder as this notebook.

Then use Jupyter's file browser to locate the file and right-click the file to download it.

In [21]:
SAVE_PLOT = False
if SAVE_PLOT:
    plotfile = f'./{dataset_key}_TS.png'
    figure.savefig(plotfile, dpi=100)

Release the workers¶

In [22]:
!date
Thu May 18 11:37:29 MDT 2023
In [ ]:
cluster.close()

Show which python package versions were used¶

In [24]:
%load_ext watermark
%watermark -iv
intake       : 0.6.8
json         : 2.0.9
sys          : 3.11.3 | packaged by conda-forge | (main, Apr  6 2023, 08:57:19) [GCC 11.3.0]
dask_jobqueue: 0.8.1
matplotlib   : 3.7.1
xarray       : 2023.4.2
numpy        : 1.24.3